Mining ChIP-chip data for transcription factor and cofactor binding sites

Authors

  • Andrew D. Smith
  • Pavel Sumazin
  • Debopriya Das
  • Michael Q. Zhang
Abstract

MOTIVATION: Identification of single motifs and motif pairs that can be used to predict transcription factor localization in ChIP-chip data and gene expression in tissue-specific microarray data.

RESULTS: We describe a methodology for de novo identification of individual binding site motifs and interacting motif pairs from ChIP-chip data, using an algorithm that integrates localization data directly into the motif discovery process. We combine matrix-enumeration-based motif discovery with multivariate regression to evaluate candidate motifs and identify motif interactions. When applied to the HNF localization data in liver and pancreatic islets, our methods produce motifs that are either novel or improved versions of known motifs. All motif pairs identified as predictive of localization are further evaluated according to how well they predict expression in liver and islets and how well the relative positions of their occurrences are conserved. We find that interaction models of HNF1 and CDP motifs provide excellent prediction of both HNF1 localization and gene expression in liver. Our results demonstrate that ChIP-chip data can be used to identify interacting binding site motifs.

AVAILABILITY: Motif discovery programs and analysis tools are available on request from the authors.
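The regression step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' program, which is available only on request. It assumes exact consensus-string matching in place of matrix-based motif scoring, ordinary least squares in place of whatever regression procedure the paper uses, and toy data. All names (`count_occurrences`, `fit_r2`, `pair_models`) and the two motif strings are hypothetical. The idea shown is only the comparison of an additive model with one that includes a motif-pair interaction term.

```python
import numpy as np

def count_occurrences(seq, motif):
    """Count (possibly overlapping) exact matches of a consensus string in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def fit_r2(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - (resid @ resid) / ss_tot

def pair_models(seqs, log_ratios, motif_a, motif_b):
    """Compare an additive model with one that adds an a*b interaction term."""
    a = np.array([count_occurrences(s, motif_a) for s in seqs], dtype=float)
    b = np.array([count_occurrences(s, motif_b) for s in seqs], dtype=float)
    y = np.asarray(log_ratios, dtype=float)
    _, r2_add = fit_r2(np.column_stack([a, b]), y)
    coef, r2_int = fit_r2(np.column_stack([a, b, a * b]), y)
    # coef is ordered [intercept, a, b, a*b]
    return {"r2_additive": r2_add,
            "r2_interaction": r2_int,
            "interaction_coef": coef[3]}

if __name__ == "__main__":
    # Hypothetical toy input: sequences built from two made-up consensus strings,
    # with a response that depends only on the product of the two counts.
    motif_a, motif_b = "GTTAAT", "ATCGAT"
    combos = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
    seqs = ["CCCC".join([motif_a] * na + [motif_b] * nb) for na, nb in combos]
    log_ratios = [na * nb for na, nb in combos]
    print(pair_models(seqs, log_ratios, motif_a, motif_b))
```

A motif pair would be a candidate for interaction when the model with the product term explains noticeably more of the localization signal than the additive model. In practice that comparison would need a significance test and held-out data, which this sketch omits.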

Similar resources

ChIPModule: Systematic Discovery of Transcription Factors and Their Cofactors from ChIP-seq Data

We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data. Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors, ChIPModule can efficiently identify groups of transcription factors, whose binding sites significantly co-occur in the ChIP-seq peak regions. By testing ChIPModu...

Integrative analysis of C. elegans modENCODE ChIP-seq data sets to infer gene regulatory interactions.

The C. elegans modENCODE Consortium has defined in vivo binding sites for a large array of transcription factors by ChIP-seq. In this article, we present examples that illustrate how this compendium of ChIP-seq data can drive biological insights not possible with analysis of individual factors. First, we analyze the number of independent factors bound to the same locus, termed transcription fac...

MYBS: a comprehensive web server for mining transcription factor binding sites in yeast

Correct interactions between transcription factors (TFs) and their binding sites (TFBSs) are of central importance to gene regulation. Recently developed chromatin-immunoprecipitation DNA chip (ChIP-chip) techniques and the phylogenetic footprinting method provide ways to identify TFBSs with high precision. In this study, we constructed a user-friendly interactive platform for dynamic binding s...

Discovery and validation of information theory-based transcription factor and cofactor binding site motifs

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguish...

An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data.

Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technolog...

Journal:
  • Bioinformatics

Volume 21 Suppl 1

Publication date: 2005